Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environments
Abstract
In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application on it, and optimized its execution environment. The default parameters of the TCP retransmission mechanism do not provide good performance for the data mining application, because many collisions occur during all-to-all multicasting on a large-scale PC cluster. With TCP retransmission parameters set according to the proposed parameter optimization, a reasonably good performance improvement is achieved for parallel data mining on 100 PCs. Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have also investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to perform considerably better than using hard disks as swap devices.
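The abstract does not state which retransmission parameters were tuned or to what values. As a hedged illustration of why default TCP timer settings can hurt a cluster whose round-trip times are far below the timer floor, the following Python sketch applies the standard retransmission-timeout estimator (RFC 6298) with exponential backoff. The names `RtoEstimator`, `min_rto`, `max_rto`, and `stall_after_losses`, and every numeric value used, are hypothetical assumptions for illustration, not the settings or method used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """Retransmission-timeout estimator following the RFC 6298 formulas.

    Hypothetical sketch: min_rto / max_rto stand in for the kind of tunable
    retransmission parameters the text discusses; values are illustrative.
    """
    min_rto: float                 # lower clamp on the timeout, seconds
    max_rto: float = 60.0          # upper clamp on the timeout, seconds
    alpha: float = 1 / 8           # gain for the smoothed RTT
    beta: float = 1 / 4            # gain for the RTT variance
    granularity: float = 0.01      # assumed clock granularity G, seconds
    srtt: Optional[float] = None   # smoothed RTT (unset until first sample)
    rttvar: float = 0.0            # RTT variation estimate
    rto: float = 1.0               # initial RTO before any RTT sample

    def on_rtt_sample(self, r: float) -> None:
        """Update SRTT/RTTVAR from a measured round-trip time r (seconds)."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RFC 6298 updates RTTVAR using the old SRTT, then updates SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        raw = self.srtt + max(self.granularity, 4 * self.rttvar)
        self.rto = min(max(raw, self.min_rto), self.max_rto)

    def on_timeout(self) -> None:
        """Exponential backoff: double the timer after each expiry."""
        self.rto = min(self.rto * 2, self.max_rto)


def stall_after_losses(min_rto: float, rtt: float, losses: int) -> float:
    """Total time a sender waits across `losses` consecutive timeouts."""
    est = RtoEstimator(min_rto=min_rto)
    for _ in range(20):            # warm up the estimator with steady RTTs
        est.on_rtt_sample(rtt)
    total = 0.0
    for _ in range(losses):
        total += est.rto           # wait out the current timer
        est.on_timeout()           # then back off before the next attempt
    return total


if __name__ == "__main__":
    # Hypothetical LAN round-trip time of 0.5 ms; compare two timer floors.
    for floor in (1.0, 0.05):
        print(f"min_rto={floor}s: stall after 3 losses = "
              f"{stall_after_losses(floor, 0.0005, 3):.2f}s")
```

With these hypothetical numbers, the measured RTT is negligible, so the timer floor dominates: three consecutive losses stall a sender for 7 s under a 1 s floor but only about 0.35 s under a 50 ms floor. This is the kind of effect that repeated collisions during all-to-all communication can amplify.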
Similar resources
Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments
Personal Computer/Workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data intensive applications such as data mining and ad-hoc query processing in databases are considered very important for high performance computing, as well as conventional scientific calculations. We have built and evaluated PC cluster pil...
Optimizing Protocol Parameters to Large Scale PC Cluster and Evaluation of its Effectiveness with Parallel Data Mining
Recently, PC clusters have been studied intensively as candidates for next-generation large-scale parallel computers. ATM technology is a strong candidate for the de facto standard for high-speed communication networks. Therefore, an ATM-connected PC cluster is a very promising platform from the cost/performance point of view, as a future high performance computing environment. In this paper, an ATM ...
Preliminary Experimental Results of a Parallel Association Rule Mining on ATM Connected PC Clusters
Until recently, workstations were overwhelmingly superior to personal computers in terms of performance. However, recent PC technology has dramatically improved CPU, main memory, and cache memory performance. As a result, massively parallel computer systems are moving away from proprietary components, such as CPUs and disks, toward commodity parts. As far as applications are concerned, we believe...
Using Available Remote Memory Dynamically for Parallel Data Mining Application on ATM-Connected PC Cluster
Personal computer/Workstation (PC/WS) clusters are promising candidates for future high performance computers, because of their good scalability and cost/performance ratio. Data intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, as well as conventional scientific calculations. Thus, investigating...
Data mining on PC cluster connected with storage area network: its preliminary experimental results
Personal computer/Workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. They are expected to play an important role in large-scale computer systems, such as large server sites and/or high performance parallel computers, because of their good scalability and cost/performance ratio. From the viewpoint of applications, data inte...